library(WDI)
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(tidyverse)
library(tidycensus)
library(raster)
library(exactextractr)
library(terra)

1 Introduction

In the course of working on a GIS project, you may have to perform certain tasks multiple times. For example, you may want to implement a spatial join (a procedure we learned about last class) between a point layer and several different polygon datasets that reflect different geographic scales. Or, you may want to quickly make multiple maps, to get a sense of how different variables are distributed across space before beginning a more in-depth study.

Dedicated GIS applications like ArcGIS and QGIS have built-in scripting windows that allow users to write Python scripts that can automate repetitive procedures. In addition, for users unfamiliar with Python, ArcGIS provides tools like Model Builder, and embeds batch-processing functions embedded in its menu-bused interface.

In this lesson, we’ll become familiar with some basic scripting tools that can help you to automate some of the tasks we have already covered in the course with a minimal amount of programming. These tools will come in handy if you decide to work with GIS in a longer-term project and you want to save time on your geoprocessing tasks.

2 Functions and Iteration

Before diving into some GIS-specific examples, let’s become familiar with some basic concepts and processes using a simple example. This section demonstrates how to write a basic function with a single argument/input, and then iteratively pass several different arguments to that function.

2.1 Writing functions with a single argument: A simple (non-GIS) example

Let’s say you have a large collection of temperature data, measured in Fahrenheit, and you want to convert these data to Celsius. Recall that the formula to convert between Fahrenheit and Celsius is as follows:

# fahrenheit to Celsius formula, where "F" is fahrenheit input
(F-32)*(5/9)
## [1] -17.77778

At its most basic level, R is a calculator; if for example, one of our Fahrenheit measurements is 55 degrees; we can convert this to Celsius by plugging 55 into the conversion formula:

# Converts 55 degrees fahrenheit to Celsius
(55-32)*(5/9)
## [1] 12.77778

This is easy enough, but if we have a large amount of temperature data that requires processing, we wouldn’t want to carry out this calculation for each measurement in our data collection. The first step in allowing us to carry out this conversion operation at scale is to write a function, which is simply a programming construct that takes a set of inputs (also known as arguments), manipulates those inputs/arguments in a specific way (the body of the function), and returns an output that is the product of how those inputs are manipulated in the body of the function. It is much like a recipe, where the recipe’s ingredients are analogous to a function’s inputs, the instructions about how to combine and process those ingredients are analogous to the body of the function, and the end product of the recipe (for example, a cake) is analogous to the function’s output.

Let’s see how we can wrap the Fahrenheit-Celsius formula above into a function:

fahrenheit_to_celsius_converter<-function(fahrenheit_input){
  celsius_output<-(fahrenheit_input-32)*(5/9)
  return(celsius_output)
}

Let’s unpack the code above, which we used to create our function:

  • We declare that we are creating a new function with the word function; within the parenthesis after function, we specify the function’s argument(s). Here, the function’s argument is an input named fahrenheit_input. The name of the argument(s) is arbitrary, and can be anything you like; ideally, its name should be informed by relevant context. Here, the argument/input to the function is a temperature value expressed in degrees Fahrenheit, so the name “fahrenheit_input” describes the nature of this input.
  • After enclosing the function’s arguments within parentheses, we print a right-facing curly brace {, and then define the body of the function (i.e. the recipe), which specifies how we want to transform this input. In particular, we take fahrenheit_input, subtract 32, and then multiply by 5/9, which transform the input to the celsius temperature scale. We’ll tell R to assign this transformed value to a new object, named celsius_output.
  • In the function’s final line, return(celsius_output), we specify the value we want the function to return. Here, we are saying that we want the function to return the value that was assigned to celsius_output. We then close the function by typing a left-facing curly brace below the return statement }.
  • Just as we can assign data or visualizations to objects that allow us to subsequently retrieve the outputs of our code, so too with functions. Here, we’ll assign the function we have just return to an object named fahrenheit_to_celsius_converter.

After running that code, we can use the newly created fahrenheit_to_celsius function to perform our Fahrenheit to Celsius transformations. Let’s say we have a Fahrenheit value of 68, and want to transform it to Celsius:

fahrenheit_to_celsius_converter(fahrenheit_input=68)
## [1] 20

Above, we passed the argument “fahrenheit_input=68” to the fahrenheit_to_celsius_converter function that we created; the function then took this value (68), plugged it into “fahrenheit_input” within the function and assigned the resulting value to “celsius_output”; it then returned the value of “celsius_output” (20) back to us.

Let’s try another one:

fahrenheit_to_celsius_converter(fahrenheit_input=22)
## [1] -5.555556

In short, we can specify any value for the “fahrenheit_input” argument; this value will be substituted for “fahrenheit_input” in the expression celsius_output<-(fahrenheit_input-32)*(5/9), after which the value of celsius_output will be returned to us.

2.2 Iteration with a single-input function

Using this newly created function helps us to avoid manually converting each of our temperature values from the Fahrenheit scale to the Celsius scale; instead of repeating the calculation over and over manually, we could simply plug our Fahrenheit temperature values into the function, and let the function carry out the calculation for us. However, it is still time-consuming to plug our Fahrenheit values into the function one-by-one. For example, let’s say we have a vector of Fahrenheit temperature values; below, we’ll assign these values to an object named fahrenheit_input_vector:

fahrenheit_input_vector<-c(45.6, 95.9, 67.8, 43)

If we wanted to convert all of these Fahrenheit values to the Celsius scale, we could do so individually, i.e. 

fahrenheit_to_celsius_converter(fahrenheit_input=45.6)
## [1] 7.555556

And so on.

However, we can also iteratively apply our function to all of these vector elements, and deposit the transformed results into a new vector. In programming languages, functions are typically applied to multiple inputs in an iterative fashion using a construct known as a for-loop, which some of you may already be familiar with. R users also frequently use specialized functions (instead of for-loops) to iterate over elements; this is often faster, or at the very least, makes R scripts more readable. One family of these iterative functions is the “Apply” family of functions. A more recent set of functions that facilitate iteration is part of the tidyverse, and is found within the purrr package. These functions known as map() functions, and we will use them here to iteratively apply our functions to multiple inputs (the “map” label might be confusing when working in a GIS setting where you might be making actual maps, i.e. spatial visualizations; however, it should be clear from the context whether we are referring to map() functions within purrr, or to actual maps).

Let’s see how we can use a map() function to sequentially apply the fahrenheit_to_celsius_converter() function we created to several different values for the “fahrenheit_input” argument, contained in fahrenheit_input_vector. We’ll pass fahrenheit_input_vector as the first argument to the map_dbl() function, and fahrenheit_to_celsius_converter (i.e. the function we want to apply iteratively to the elements in `thefahrenheit_input_vector ) as the second argument. The result of this operation will be a new “results vector”, containing the transformed temperature values for each input in the original vector of Fahrenheit values (fahrenheit_input_vector). We’ll assign this result/output vector to a new object named celsius_outputs_vector:

celsius_outputs_vector<-map_dbl(fahrenheit_input_vector, fahrenheit_to_celsius_converter)

In short, the code above takes ``fahrenheit_input_vector(i.e. a vector with the numbers 45.6, 95.9, 67.8, 43), and runs each of these numbers through thefahrenheit_converter()function, and sequentially deposits the transformed result to the newly createdcelsius_outputs_vector()``` object, which contains the following elements:

celsius_outputs_vector
## [1]  7.555556 35.500000 19.888889  6.111111

More explicitly, the code that reads celsius_outputs_vector<-map_dbl(fahrenheit_input_vector, fahrenheit_converter) did the following:

  1. Pass 45.6 (the first element in the input vector, fahrenheit_input_vector) to the fahrenheit_converter() function, and place the output (7.555556) as the first element in a new vector of transformed values, named celsius_outputs_vector.
  2. Pass 95.9 (the second element in the input vector, fahrenheit_input_vector) to the fahrenheit_converter() function, and deposit the output (35.500000) as the second element in celsius_outputs_vector.
  3. Pass 67.8 (the third element in the input vector, fahrenheit_input_vector) to the fahrenheit_converter() function, and deposit the output (19.888889) as the third element in celsius_outputs_vector.
  4. Pass 43 (the fourth element in the input vector, fahrenheit_input_vector) to the fahrenheit_converter() function, and deposit the output (6.111111) as the fourth element in celsius_outputs_vector.

If we want to extract an element from the output vector, we can do so by specifying its index within brackets. For instance, if we wanted to extract the second element in celsius_outputs_vector, we could type the following:

celsius_outputs_vector[2]
## [1] 35.5

There are a variety of map functions, and the precise one you should use turns on the number of arguments used by the function (here, this value is of course one), and the desired class of the output (i.e. numeric vector, character vector, data frame, list etc.). Below, we’ll talk more about how to handle functions with multiple arguments within the purrr ecosystem. Before that, though, let’s see how to use a slightly different type of map function to return a different kind of output.

First, let’s say we want to iteratively pass the input values from fahrenheit_input_vector as arguments to fahrenheit_converter(), but that we want to return the output values in a list, rather than a vector (as above). To do so, we pass our input vector (fahrenheit_input_vector), and the function we want to iteratively apply to the elements of the input vector (fahrenheit_converter) to the map() function. We’ll assign the output list to a new object named celsius_outputs_list:

celsius_outputs_list<-map(fahrenheit_input_vector, fahrenheit_to_celsius_converter)

Let’s now print the contents of our list; note that each of our transformed Celsius temperature values is now a separate list element within celsius_outputs_list:

celsius_outputs_list
## [[1]]
## [1] 7.555556
## 
## [[2]]
## [1] 35.5
## 
## [[3]]
## [1] 19.88889
## 
## [[4]]
## [1] 6.111111

We can extract specific list elements by specifying the index number of the list element in double-brackets after the name of the list object (note that this is slightly different from extracting the element of a numeric vector, for which we use a single bracket, as demonstrated earlier). For example, if we want to extract the second element of celsius_outputs_list, we can use the following:

celsius_outputs_list[[2]]
## [1] 35.5

A handy tidyverse function, called pluck() can also be used to extract list elements; for example, we can extract the second list element from celsius_outputs_list with pluck() with the following:

celsius_outputs_list %>% pluck(2)
## [1] 35.5

If we want to organize our information in a data frame, the first step is to slightly modify our function to return a data frame:

fahrenheit_to_celsius_converter_df<-function(fahrenheit_input){
  celsius_output<-(fahrenheit_input-32)*(5/9)
  celsius_output_df<-data.frame(fahrenheit_input, celsius_output)
  return(celsius_output_df)
}

Now, let’s test this function for a single value:

fahrenheit_to_celsius_converter_df(44)
##   fahrenheit_input celsius_output
## 1               44       6.666667

Let’s say we want to assemble a dataset using multiple Fahrenheit input values, where one column consists of these input values, and the second column consists of the corresponding Celsius outputs? To that end, we can use the map_dfr() function, which is part of the purrr package’s suite of map() functions. Below, we’ll use the same fahrenheit_input_vector from above as our input vector, and pass this vector as the first argument to the map_dfr() function; the second argument is the name of the function to which we want to apply these input values, namely, fahrenheit_to_celsius_converter_df(). We’ll assign the dataset to an object named celsius_outputs_df:

celsius_outputs_df<-map_dfr(fahrenheit_input_vector, fahrenheit_to_celsius_converter_df)

Let’s now print the contents of celsius_outputs_df:

celsius_outputs_df
##   fahrenheit_input celsius_output
## 1             45.6       7.555556
## 2             95.9      35.500000
## 3             67.8      19.888889
## 4             43.0       6.111111

We now have a dataset with one column consisting of our Fahrenheit inputs (taken from fahrenheit_input_vector), and a second column consisting of our Celsius outputs (derived by applying the fahrenheit_to_celsius_converter_df() function to our vector of input values, `fahrenheit_input_vector).

We’ve just covered three different purrr functions: map() (which returns a list), map_dbl() (which returns a vector), and map_dfr() (which returns a dataframe). There are other map functions which return different types of objects; you can see a list of these other map functions by inspecting the documentation for the map() function with ?purrr::map().

2.3 Iteration with functions that have two inputs

In the previous subsection, we explored some basic functions from the purrr package’s map() family, which are used to iteratively apply a given function to a set of inputs, and then return a set of outputs (i.e. the results of applying that function) that is contained in an object (the type of object, i.e. list, vector etc. is determined by the type of map function that is used). The functions we were iteratively applying in the examples above were single-input functions, but functions can (and often do) require multiple inputs.

In this subsection, we’ll explore iteration with map2() functions, which are used for iteration in cases where the relevant function has two inputs. First, let’s define a simple 2-input function. In this example, we’ll define a function that takes export and import values as inputs, and returns a value for net exports (defined as the difference between total exports and total imports), and assign the function to an object named net_exports_calculation():

net_exports_calculation<-function(exports, imports){
  net_export_value<-exports-imports
  return(net_export_value)
}

Let’s now test the function for a single case, with arbitrary input values, denominated in units of $1 million. We’ll test the function for a hypothetical case where exports are $133 million (exports=133) and imports are $55 million (imports=55):

net_exports_calculation(exports=133, imports=55)
## [1] 78

The function, as expected, returns a net export value of $78 million.

Now, let’s say we have export and import data for several countries, and want to calculate net exports for all of these countries by iteratively applying net_exports_calculation() to all of our data. The first country has exports of $78 million and imports of $134 million; the second has exports of $499 million and imports of $345 million; and the third country has exports of $785 million and imports of $645 million. How can we apply net_exports_calculation() to each of these countries, and calculate a net export value for each one?

First, we’ll create vectors that contain our import and export values. Below, a numeric vector containing each country’s export values is assigned to an object named export_vector, and a numeric vector containing each country’s import values is assigned to an object named import_vector:

export_vector<-c(78, 499, 785)
import_vector<-c(134, 345, 645)

Now, let’s cycle these vectors through the net_exports_calculation()```` function, and deposit the resulting export values into a list, which we'll namenet_export_list. Instead of using themap()function (as above), we'll use a function namedmap2()```, since the function we’re applying has two inputs.

Below, the first argument to map2(), export_vector````, is the vector we defined above which contains our export values; the second argument,import_vector, is our vector of import values; and the third argument,net_exports_calculation```, is the function we want to apply.

The map2() function will take the first element of export_vector and the first element of import_vector, run those values through net_exports_calculation, and deposit the result (i.e. the net export value for the first country) as the first element in the net_export_list list object. Then, it will take the second element of export_vector and the second element of import_vector, run those values through net_exports_calculation, and deposit the result (i.e. the net export value for the first country) as the second element in the net_export_list list object. Finally, it will go through the same process for the third country.

net_export_list<-map2(export_vector, import_vector, net_exports_calculation)

Let’s now print the contents of net_export_list, which contain our net export values:

net_export_list
## [[1]]
## [1] -56
## 
## [[2]]
## [1] 154
## 
## [[3]]
## [1] 140

If, instead of depositing the results into a list, we’d like to deposit our outputs into a numeric vector, we can do so using the map2_dbl() function, the analog of map_dbl() which is used when the function takes two inputs rather than one. We’ll assign our results vector to a new object named net_export_vector:

net_export_vector<-map2_dbl(export_vector, import_vector, net_exports_calculation)

Let’s now print the contents of net_export_vector:

net_export_vector
## [1] -56 154 140

Note that the net export values contained in net_export_vector are the same as those in net_export_list, as we would expect; the only difference is that net_export_vector is a vector and net_export_list is a list.

Just as map2() and map2_dbl() serve as analogs to map() and map_dbl() for functions with two inputs, map2_dfr() is the analog to map_dfr() for such functions. To see how it works, let’s first define a function that returns a dataframe, with the first column consisting of the export value, the second consisting of the import value, and the final column consisting of the net export value that is calculated within the body of the function. We’ll assign this function to a new object named net_exports_calculation_df():

net_exports_calculation_df<-function(exports, imports){
  net_exports<-exports-imports
  df<-data.frame(exports, imports, net_exports)
  return(df)
}

Now, let’s test the net_exports_calculation_df() function we’ve just defined using an export value of $100 millions (exports=100) and an import value of $40 million (imports=40 million):

net_exports_calculation_df(exports=100,imports=40)
##   exports imports net_exports
## 1     100      40          60

Having confirmed that the function works as expected, let’s use the map2_dfr() function to iteratively apply the net_exports_calculation_df() function to the export values contained in export_vector and the import values contained in import_vector.

The code below takes the first element in export_vector, and the first element in import_vector, and runs these input values through net_exports_calculation_df() to create the first row of the output data frame (where the first column is the first element of export_vector, the second column is the first element of import_vector, and the third column is the net export value calculated using these values). It then takes the second element in export_vector, and the second element in import_vector, and runs these input values through the net_exports_calculation_df() function to create the second row of the output data frame. Finally, it repeats the process for the third elements of export_vector and export_vector, and creates the third row of the output data frame. We’ll assign the data frame to a new object named net_exports_dataframe:

net_exports_dataframe<-map2_dfr(export_vector, import_vector, net_exports_calculation_df)

Let’s now print the contents of our newly created data frame object:

net_exports_dataframe
##   exports imports net_exports
## 1      78     134         -56
## 2     499     345         154
## 3     785     645         140

2.4 Iteration with functions that have more than two inputs

While the map2() family functions allows us to conveniently handle iteration tasks involving two-input functions, you will often need to work with functions with more than two inputs. How can we carry out iteration tasks with these multi-input functions?

The pmap() family of functions within purrr allows us to handle iteration tasks using functions with any number of inputs greater than two, by using a list as a container for all of the inputs we would like to iterate over.

To see how this works, let’s first define a function with more than two arguments. In particular, we’ll create a function that takes numeric values for consumption spending (consumption_spending), government spending (government_spending), investment spending (investment_spending), and net exports (net_exports) as arguments, and returns a value for GDP (which is the sum of these values). We’ll assign this GDP calculator function to a new object named gdp_calculation:

gdp_calculation<-function(consumption_spending, government_spending, investment_spending, net_exports){
  gdp<-consumption_spending+government_spending+investment_spending+net_exports
  return(gdp)
}

Let’s now test the function; as before, we’ll assume that units are in millions of dollars. We’ll test our function for a country with consumption spending of $125 million (consumption_spending=125), government spending of $66 million (government_spending=66), investment spending of $36 million (investment_spending=36), and net exports of -$33 million (net_exports=-33):

gdp_calculation(consumption_spending = 125, government_spending=66, investment_spending=36, net_exports=-33)
## [1] 194

As expected, the function returns the sum of these values, which translates into 194 (interpreted here as a GDP of $194 million).

Now, let’s say we have consumption spending, government spending, investment spending, and net export data for four different countries, and we want to iteratively apply the gdp_calculation() function over the data for these four countries, and then deposit the resulting GDP values for each of the countries into a list.

The first step is to create a new list of input values, where each list element is a vector that contains the country-level values for each argument of the gdp_calculation() function. We’ll assign this list to a new object named gdp_input_list:

gdp_input_list<-list(consumption_spending=c(44, 89, 64, 33),
                     government_spending=c(54, 76, 222, 110),
                     investment_spending=c(123, 200, 55, 45),
                     net_exports=c(-55, 89, 143,-12))

To make sure we understand what gdp_input_list represents, consider the first element in each of the four numeric vectors in the list; the first element corresponds to the first country, which we can see has consumption spending of $44 million, government spending of $54 million, investment spending of $123 million, and net exports of -$55 million. The second element of each of the vectors in the list corresponds to information for the second country, which has consumption spending of $89 million, government spending of $76 million, investment spending of $200 million, and net exports of $89 million. And so on for Countries 3 and 4.

Now that we have defined our list of input values (gdp_input_list) based on the arguments to the gdp_calculation() function, we can pass gdp_input_list (the list of input values) and gdp_calculation() (the function we are iterating over) as arguments to purrr’s pmap() function, which will iteratively apply the gdp_calculation() function to the inputs specified in gdp_input_list in a vectorized fashion (i.e. the function uses the first value in each vector of the input list to generate the first output value, and then uses the second value in each vector of the input list to generate the second output value etc). We’ll assign the resulting list of output values to a new object named gdp_output_list:

gdp_output_list<-pmap(gdp_input_list, gdp_calculation)

Let’s now print the contents of gdp_output_list:

gdp_output_list
## [[1]]
## [1] 166
## 
## [[2]]
## [1] 454
## 
## [[3]]
## [1] 484
## 
## [[4]]
## [1] 176

As expected, the first list element contains the GDP of the first country, 166 (44+54+123+55); the second list element contains the GDP of the second country, 454 (89+76+200+89); and so on, for the third and fourth countries.

If we want our function outputs in a vector, rather than a list, we can simply use the pmap_dbl() function instead; below, we’ll assign this vector of GDP values to an object named gdp_output_vector:

gdp_output_vector<-pmap_dbl(gdp_input_list, gdp_calculation)

When we print the contents of gdp_output_vector, we can see the GDP values calculated by gdp_calculation() (based on the arguments specified in gdp_input_list) within a vector:

gdp_output_vector
## [1] 166 454 484 176

As an exercise, see if you can write a function that takes arguments for consumption spending, government spending, investment spending, and net exports, and returns a data frame in which these values are columns, along with another column that contains the GDP value. Then, use the pmap_dfr() function to create a new dataframe using the input values contained in gdp_input_list (which we used above).

Your code should yield a table that looks something like this:

##   consumption_spending government_spending investment_spending net_exports gdp
## 1                   44                  54                 123         -55 166
## 2                   89                  76                 200          89 454
## 3                   64                 222                  55         143 484
## 4                   33                 110                  45         -12 176

3 Using functions and iteration in GIS workflows

3.1 Functions, iteration, and spatial joins

In the second assignment, one of your options was to implement a spatial join using traffic stop data for the city of Aurora, compiled by the Stanford Open Policing Project. In particular, the assignment asked you to implement a spatial join with respect to census tracts, compute the total number of stops in each census tract, and then present this information in a data frame.

The script below reviews the code that was required to carry out these tasks:

# Reads in data and assigns to object named "CO_aurora_policestops"
CO_aurora_policestops<-read_csv("https://www.dropbox.com/s/u7xqa7dc34hlsfp/co_aurora_2020_04_01.csv?dl=1")

# Define spatial point object using "CO_aurora_policestops", and assign it to a new object named "co_aurora_sf"
co_aurora_sf<-CO_aurora_policestops %>% 
                  drop_na(lng) %>% 
                  st_as_sf(coords=c("lng", "lat"), crs=4326)

# Project "co_aurora_sf" into the appropriate projection for Aurora, and assign the reprojected data to an object named "co_aurora_sf_project"
co_aurora_sf_project<-co_aurora_sf %>% st_transform(2232)

# Extract census tract data for CO using tidycensus, and assign to a new object named "CO_tracts"
CO_tracts<-get_decennial(geography = "tract",
                         state="CO",
                         variables = "P001001",
                         year = 2010,
                         geometry=TRUE) %>% 
  st_transform(2232)

# Spatially join "CO_tracts" object to "co_aurora_sf_project" object, yielding a new dataset that contains information on the Census tract in which each stop occured; this new dataset is assigned to an object named "stops_tracts_join"
stops_tracts_join<-st_join(co_aurora_sf_project, CO_tracts)

# create a new dataframe containing information on the number of stops for each tract, and assign this data frame to an object named "stops_per_tract"
stops_per_tract<-stops_tracts_join %>% 
  group_by(GEOID, NAME) %>% 
  summarize(n())

# Renames column named "n()" to "traffic_stops
stops_per_tract_final<-stops_per_tract %>% 
                      rename(traffic_stops="n()")

Recall that the assignment also asked you to export the final dataset (here, the data assigned to stops_per_tract_final) as a CSV file. The final CSV file will be less unwieldy if we transform stops_per_tract_final, which is currently an sf object, into a dataframe object, and delete the “geometry” column. We can do so with the following:

# Converts "stops_per_tract_final" object to data frame, deletes "geometry" column, and assigns to a new object named "stops_per_tract_final_df"
stops_per_tract_final_df<-as.data.frame(stops_per_tract_final) %>% 
                          dplyr::select(-geometry)

Finally, we can export stops_per_tract_final_df to our working directory with the following:

write_csv(stops_per_tract_final_df, "census_tracts_stops.csv")

This script allowed us to generate and export a census tract-level dataset of police traffic stops, but let’s say that we want to generate a dataset of traffic stops (using this data) at various other geographies as well (for instance, counties or county subdivisions or census blocks). One strategy to build these datasets at these different geographies is to simply use the script we developed for census tracts, copying and pasting relevant changes as necessary to adapt the code for different geographic boundaries.

A more efficient and scalable approach, however, is to wrap the code we developed into a function that takes a desired geography (from tidycensus) as an argument, and returns a dataset containing information on the number of stops per geographic unit (based on that desired geography). We’ll use the same principles discussed in earlier sections to create this function, which generalizes the spatial join script for census tracts:

# Creates function that takes desired tidycensus geography as argument, and returns data frame of number of traffic stops with respect to that geography; function is assigned to an object named "traffic_stop_geography"
traffic_stop_geography<-function(desired_geography){

geography_extract<-get_decennial(geography = desired_geography,
                                   state="CO",
                                   variables = "P001001",
                                   year = 2010,
                                   geometry=TRUE) %>% 
                          st_transform(2232)
  
stop_geography_join<-st_join(co_aurora_sf_project, geography_extract)

stops_per_geography<-stop_geography_join %>% 
                            group_by(GEOID, NAME) %>% 
                            summarize(n()) %>% 
                            rename(traffic_stops="n()")

stops_per_geography_df<-as.data.frame(stops_per_geography) %>% 
                          dplyr::select(-geometry)

return(stops_per_geography_df)
}

Now that we have our function, let’s go ahead and test it. Let’s generate a dataset containing information on the number of stops within each county subdivision; we’ll assign the dataset to an object named county_subdivision_stops:

county_subdivision_stops<-traffic_stop_geography(desired_geography="county subdivision")

Let’s print the contents of county_subdivision_stops to see what our dataset looks like:

Now, let’s say we want to generate a county-level dataset of stops, a census tract level dataset of stops, and a zip code-level dataset of stops, and then place these datasets into a list. We can do so using the map() function we learned about earlier to iteratively apply the traffic_stop_geography() function to our desired input geographies.

First, we’ll create a character vector that contains the tidycensus codes for the geographies we’d like to use in our spatial joins; we’ll assign this vector to a new object named desired_geography_inputs:

desired_geography_inputs<-c("county", "tract", "zcta")

Now, we’ll use purrr’s map() function to iteratively apply traffic_stop_geography() to the inputs specified in desired_geography_inputs; we’ll assign the output list to a new object named geography_stop_list:

geography_stop_list<-map(desired_geography_inputs, traffic_stop_geography)

Let’s go ahead and print our list:

geography_stop_list
## [[1]]
##   GEOID                      NAME traffic_stops
## 1 08001    Adams County, Colorado         22489
## 2 08005 Arapahoe County, Colorado        117391
## 3 08013  Boulder County, Colorado             1
## 4 08031   Denver County, Colorado          1244
## 5 08035  Douglas County, Colorado           383
## 
## [[2]]
##           GEOID                                          NAME traffic_stops
## 1   08001007801    Census Tract 78.01, Adams County, Colorado          3108
## 2   08001007802    Census Tract 78.02, Adams County, Colorado          1459
## 3   08001007900       Census Tract 79, Adams County, Colorado          1304
## 4   08001008000       Census Tract 80, Adams County, Colorado          1186
## 5   08001008100       Census Tract 81, Adams County, Colorado          1901
## 6   08001008200       Census Tract 82, Adams County, Colorado          3359
## 7   08001008308    Census Tract 83.08, Adams County, Colorado           895
## 8   08001008309    Census Tract 83.09, Adams County, Colorado          8803
## 9   08001008353    Census Tract 83.53, Adams County, Colorado           447
## 10  08001008401    Census Tract 84.01, Adams County, Colorado             1
## 11  08001008523    Census Tract 85.23, Adams County, Colorado             1
## 12  08001988700     Census Tract 9887, Adams County, Colorado            25
## 13  08005005612 Census Tract 56.12, Arapahoe County, Colorado             1
## 14  08005005628 Census Tract 56.28, Arapahoe County, Colorado             1
## 15  08005005800    Census Tract 58, Arapahoe County, Colorado             1
## 16  08005005951 Census Tract 59.51, Arapahoe County, Colorado             1
## 17  08005006711 Census Tract 67.11, Arapahoe County, Colorado             1
## 18  08005006712 Census Tract 67.12, Arapahoe County, Colorado             2
## 19  08005006854 Census Tract 68.54, Arapahoe County, Colorado           121
## 20  08005006855 Census Tract 68.55, Arapahoe County, Colorado            14
## 21  08005006856 Census Tract 68.56, Arapahoe County, Colorado           122
## 22  08005007104 Census Tract 71.04, Arapahoe County, Colorado          7166
## 23  08005007105 Census Tract 71.05, Arapahoe County, Colorado           698
## 24  08005007106 Census Tract 71.06, Arapahoe County, Colorado           272
## 25  08005007107 Census Tract 71.07, Arapahoe County, Colorado           952
## 26  08005007201 Census Tract 72.01, Arapahoe County, Colorado          2195
## 27  08005007202 Census Tract 72.02, Arapahoe County, Colorado          1168
## 28  08005007301 Census Tract 73.01, Arapahoe County, Colorado          1316
## 29  08005007302 Census Tract 73.02, Arapahoe County, Colorado          5446
## 30  08005007400    Census Tract 74, Arapahoe County, Colorado          1293
## 31  08005007500    Census Tract 75, Arapahoe County, Colorado           510
## 32  08005007600    Census Tract 76, Arapahoe County, Colorado          1038
## 33  08005007702 Census Tract 77.02, Arapahoe County, Colorado          1922
## 34  08005007703 Census Tract 77.03, Arapahoe County, Colorado          1002
## 35  08005007704 Census Tract 77.04, Arapahoe County, Colorado           307
## 36  08005080000   Census Tract 800, Arapahoe County, Colorado          1572
## 37  08005080100   Census Tract 801, Arapahoe County, Colorado          1324
## 38  08005080200   Census Tract 802, Arapahoe County, Colorado          2542
## 39  08005080300   Census Tract 803, Arapahoe County, Colorado          1410
## 40  08005080400   Census Tract 804, Arapahoe County, Colorado          1111
## 41  08005080500   Census Tract 805, Arapahoe County, Colorado          1018
## 42  08005080600   Census Tract 806, Arapahoe County, Colorado           401
## 43  08005080700   Census Tract 807, Arapahoe County, Colorado           943
## 44  08005080800   Census Tract 808, Arapahoe County, Colorado           734
## 45  08005080900   Census Tract 809, Arapahoe County, Colorado          3249
## 46  08005081000   Census Tract 810, Arapahoe County, Colorado          4286
## 47  08005081100   Census Tract 811, Arapahoe County, Colorado          3895
## 48  08005081200   Census Tract 812, Arapahoe County, Colorado           598
## 49  08005081300   Census Tract 813, Arapahoe County, Colorado          3569
## 50  08005081400   Census Tract 814, Arapahoe County, Colorado          2323
## 51  08005081500   Census Tract 815, Arapahoe County, Colorado          3189
## 52  08005081600   Census Tract 816, Arapahoe County, Colorado          2084
## 53  08005081700   Census Tract 817, Arapahoe County, Colorado           389
## 54  08005081800   Census Tract 818, Arapahoe County, Colorado          5590
## 55  08005081900   Census Tract 819, Arapahoe County, Colorado          3641
## 56  08005082000   Census Tract 820, Arapahoe County, Colorado          3252
## 57  08005082100   Census Tract 821, Arapahoe County, Colorado          2796
## 58  08005082200   Census Tract 822, Arapahoe County, Colorado          2216
## 59  08005082300   Census Tract 823, Arapahoe County, Colorado          1511
## 60  08005082400   Census Tract 824, Arapahoe County, Colorado          2295
## 61  08005082500   Census Tract 825, Arapahoe County, Colorado           225
## 62  08005082600   Census Tract 826, Arapahoe County, Colorado          2786
## 63  08005082700   Census Tract 827, Arapahoe County, Colorado           698
## 64  08005082800   Census Tract 828, Arapahoe County, Colorado          1604
## 65  08005082900   Census Tract 829, Arapahoe County, Colorado          3011
## 66  08005083000   Census Tract 830, Arapahoe County, Colorado           211
## 67  08005083100   Census Tract 831, Arapahoe County, Colorado          1409
## 68  08005083200   Census Tract 832, Arapahoe County, Colorado           146
## 69  08005083300   Census Tract 833, Arapahoe County, Colorado           843
## 70  08005083400   Census Tract 834, Arapahoe County, Colorado           455
## 71  08005083500   Census Tract 835, Arapahoe County, Colorado           703
## 72  08005083600   Census Tract 836, Arapahoe County, Colorado          6238
## 73  08005083700   Census Tract 837, Arapahoe County, Colorado           134
## 74  08005083800   Census Tract 838, Arapahoe County, Colorado          2676
## 75  08005083900   Census Tract 839, Arapahoe County, Colorado          3919
## 76  08005084000   Census Tract 840, Arapahoe County, Colorado           150
## 77  08005084100   Census Tract 841, Arapahoe County, Colorado          1432
## 78  08005084200   Census Tract 842, Arapahoe County, Colorado           627
## 79  08005084300   Census Tract 843, Arapahoe County, Colorado           334
## 80  08005084400   Census Tract 844, Arapahoe County, Colorado           154
## 81  08005084500   Census Tract 845, Arapahoe County, Colorado           262
## 82  08005084600   Census Tract 846, Arapahoe County, Colorado          1375
## 83  08005084700   Census Tract 847, Arapahoe County, Colorado          1904
## 84  08005084800   Census Tract 848, Arapahoe County, Colorado            30
## 85  08005084900   Census Tract 849, Arapahoe County, Colorado            90
## 86  08005085000   Census Tract 850, Arapahoe County, Colorado           114
## 87  08005085100   Census Tract 851, Arapahoe County, Colorado             7
## 88  08005085200   Census Tract 852, Arapahoe County, Colorado            86
## 89  08005085300   Census Tract 853, Arapahoe County, Colorado           453
## 90  08005085400   Census Tract 854, Arapahoe County, Colorado             8
## 91  08005085500   Census Tract 855, Arapahoe County, Colorado            17
## 92  08005085600   Census Tract 856, Arapahoe County, Colorado             6
## 93  08005085700   Census Tract 857, Arapahoe County, Colorado           671
## 94  08005085800   Census Tract 858, Arapahoe County, Colorado           512
## 95  08005085900   Census Tract 859, Arapahoe County, Colorado           205
## 96  08005086000   Census Tract 860, Arapahoe County, Colorado            29
## 97  08005086100   Census Tract 861, Arapahoe County, Colorado            12
## 98  08005086300   Census Tract 863, Arapahoe County, Colorado           191
## 99  08005086400   Census Tract 864, Arapahoe County, Colorado            21
## 100 08005086500   Census Tract 865, Arapahoe County, Colorado          1074
## 101 08005086600   Census Tract 866, Arapahoe County, Colorado           426
## 102 08005086700   Census Tract 867, Arapahoe County, Colorado           174
## 103 08005086800   Census Tract 868, Arapahoe County, Colorado            23
## 104 08005086900   Census Tract 869, Arapahoe County, Colorado            21
## 105 08005087000   Census Tract 870, Arapahoe County, Colorado           900
## 106 08005087100   Census Tract 871, Arapahoe County, Colorado           382
## 107 08005087200   Census Tract 872, Arapahoe County, Colorado             4
## 108 08005087300   Census Tract 873, Arapahoe County, Colorado             1
## 109 08013012607 Census Tract 126.07, Boulder County, Colorado             1
## 110 08031001701   Census Tract 17.01, Denver County, Colorado             3
## 111 08031002403   Census Tract 24.03, Denver County, Colorado             1
## 112 08031002601   Census Tract 26.01, Denver County, Colorado             1
## 113 08031002703   Census Tract 27.03, Denver County, Colorado             3
## 114 08031002801   Census Tract 28.01, Denver County, Colorado             3
## 115 08031002803   Census Tract 28.03, Denver County, Colorado             2
## 116 08031002902   Census Tract 29.02, Denver County, Colorado             1
## 117 08031003003   Census Tract 30.03, Denver County, Colorado             6
## 118 08031003004   Census Tract 30.04, Denver County, Colorado             3
## 119 08031003102   Census Tract 31.02, Denver County, Colorado             3
## 120 08031003201   Census Tract 32.01, Denver County, Colorado             9
## 121 08031003202   Census Tract 32.02, Denver County, Colorado             3
## 122 08031003401   Census Tract 34.01, Denver County, Colorado             8
## 123 08031003402   Census Tract 34.02, Denver County, Colorado             1
## 124 08031004102   Census Tract 41.02, Denver County, Colorado             2
## 125 08031004106   Census Tract 41.06, Denver County, Colorado           195
## 126 08031004107   Census Tract 41.07, Denver County, Colorado            26
## 127 08031004301   Census Tract 43.01, Denver County, Colorado             2
## 128 08031004403   Census Tract 44.03, Denver County, Colorado            35
## 129 08031004404   Census Tract 44.04, Denver County, Colorado            15
## 130 08031004405   Census Tract 44.05, Denver County, Colorado            63
## 131 08031006804   Census Tract 68.04, Denver County, Colorado             1
## 132 08031006811   Census Tract 68.11, Denver County, Colorado             1
## 133 08031006813   Census Tract 68.13, Denver County, Colorado            30
## 134 08031006814   Census Tract 68.14, Denver County, Colorado             6
## 135 08031007006   Census Tract 70.06, Denver County, Colorado           164
## 136 08031007037   Census Tract 70.37, Denver County, Colorado           613
## 137 08031007088   Census Tract 70.88, Denver County, Colorado             1
## 138 08031007089   Census Tract 70.89, Denver County, Colorado             3
## 139 08031008304   Census Tract 83.04, Denver County, Colorado             2
## 140 08031008312   Census Tract 83.12, Denver County, Colorado            15
## 141 08031008388   Census Tract 83.88, Denver County, Colorado             4
## 142 08031008389   Census Tract 83.89, Denver County, Colorado             4
## 143 08031008390   Census Tract 83.90, Denver County, Colorado             4
## 144 08031008391   Census Tract 83.91, Denver County, Colorado             1
## 145 08031980000    Census Tract 9800, Denver County, Colorado            10
## 146 08035013901 Census Tract 139.01, Douglas County, Colorado           381
## 147 08035013904 Census Tract 139.04, Douglas County, Colorado             1
## 148 08035014007 Census Tract 140.07, Douglas County, Colorado             1
## 
## [[3]]
##      GEOID                  NAME traffic_stops
## 1  0880010 ZCTA5 80010, Colorado         20176
## 2  0880011 ZCTA5 80011, Colorado         36835
## 3  0880012 ZCTA5 80012, Colorado         22681
## 4  0880013 ZCTA5 80013, Colorado         12558
## 5  0880014 ZCTA5 80014, Colorado         16634
## 6  0880015 ZCTA5 80015, Colorado          9762
## 7  0880016 ZCTA5 80016, Colorado          4154
## 8  0880017 ZCTA5 80017, Colorado         15986
## 9  0880018 ZCTA5 80018, Colorado           264
## 10 0880019 ZCTA5 80019, Colorado           175
## 11 0880022 ZCTA5 80022, Colorado             1
## 12 0880045 ZCTA5 80045, Colorado           925
## 13 0880102 ZCTA5 80102, Colorado             1
## 14 0880111 ZCTA5 80111, Colorado             4
## 15 0880112 ZCTA5 80112, Colorado           123
## 16 0880113 ZCTA5 80113, Colorado             2
## 17 0880121 ZCTA5 80121, Colorado             1
## 18 0880122 ZCTA5 80122, Colorado             1
## 19 0880137 ZCTA5 80137, Colorado             2
## 20 0880138 ZCTA5 80138, Colorado            16
## 21 0880202 ZCTA5 80202, Colorado             3
## 22 0880203 ZCTA5 80203, Colorado             2
## 23 0880205 ZCTA5 80205, Colorado             1
## 24 0880206 ZCTA5 80206, Colorado             3
## 25 0880207 ZCTA5 80207, Colorado             2
## 26 0880209 ZCTA5 80209, Colorado            11
## 27 0880210 ZCTA5 80210, Colorado            10
## 28 0880218 ZCTA5 80218, Colorado            17
## 29 0880220 ZCTA5 80220, Colorado            82
## 30 0880230 ZCTA5 80230, Colorado            61
## 31 0880231 ZCTA5 80231, Colorado            44
## 32 0880237 ZCTA5 80237, Colorado             1
## 33 0880238 ZCTA5 80238, Colorado           189
## 34 0880239 ZCTA5 80239, Colorado            29
## 35 0880247 ZCTA5 80247, Colorado           721
## 36 0880249 ZCTA5 80249, Colorado            19
## 37 0880303 ZCTA5 80303, Colorado             1
## 38    <NA>                  <NA>            25

Now, we can easily extract datasets for any of the specified geographies from our list, using the relevant index. For example, let’s say we want to extract the county-level dataset; we can do so using index number 1 (since “county” is the first element in the desired_geography_inputs vector):

geography_stop_list[[1]]

Recall that we can also use the pluck() function to extract list elements; for example, let’s say we want to extract the zip-code level dataset of police stops from geography_stop_list. We can do so with the following:

geography_stop_list %>% pluck(3)

It may also be convenient to assign list elements to their own objects; for instance, if we wanted to extract the zip code dataset from geography_stop_list, and assign it to its own object (here, named police_stops_zip_code), we could do the following:

police_stops_zip_code<-geography_stop_list %>% pluck(3)

It is often convenient to explicitly name our list elements. For example, the code below uses the names() function to name the elements of geography_stop_list based on the desired_geography_inputs vector:

names(geography_stop_list)<-desired_geography_inputs

Now, when we print geography_stop_list, we can note that the list elements are named after the geography to which the dataset corresponds.

geography_stop_list
## $county
##   GEOID                      NAME traffic_stops
## 1 08001    Adams County, Colorado         22489
## 2 08005 Arapahoe County, Colorado        117391
## 3 08013  Boulder County, Colorado             1
## 4 08031   Denver County, Colorado          1244
## 5 08035  Douglas County, Colorado           383
## 
## $tract
##           GEOID                                          NAME traffic_stops
## 1   08001007801    Census Tract 78.01, Adams County, Colorado          3108
## 2   08001007802    Census Tract 78.02, Adams County, Colorado          1459
## 3   08001007900       Census Tract 79, Adams County, Colorado          1304
## 4   08001008000       Census Tract 80, Adams County, Colorado          1186
## 5   08001008100       Census Tract 81, Adams County, Colorado          1901
## 6   08001008200       Census Tract 82, Adams County, Colorado          3359
## 7   08001008308    Census Tract 83.08, Adams County, Colorado           895
## 8   08001008309    Census Tract 83.09, Adams County, Colorado          8803
## 9   08001008353    Census Tract 83.53, Adams County, Colorado           447
## 10  08001008401    Census Tract 84.01, Adams County, Colorado             1
## 11  08001008523    Census Tract 85.23, Adams County, Colorado             1
## 12  08001988700     Census Tract 9887, Adams County, Colorado            25
## 13  08005005612 Census Tract 56.12, Arapahoe County, Colorado             1
## 14  08005005628 Census Tract 56.28, Arapahoe County, Colorado             1
## 15  08005005800    Census Tract 58, Arapahoe County, Colorado             1
## 16  08005005951 Census Tract 59.51, Arapahoe County, Colorado             1
## 17  08005006711 Census Tract 67.11, Arapahoe County, Colorado             1
## 18  08005006712 Census Tract 67.12, Arapahoe County, Colorado             2
## 19  08005006854 Census Tract 68.54, Arapahoe County, Colorado           121
## 20  08005006855 Census Tract 68.55, Arapahoe County, Colorado            14
## 21  08005006856 Census Tract 68.56, Arapahoe County, Colorado           122
## 22  08005007104 Census Tract 71.04, Arapahoe County, Colorado          7166
## 23  08005007105 Census Tract 71.05, Arapahoe County, Colorado           698
## 24  08005007106 Census Tract 71.06, Arapahoe County, Colorado           272
## 25  08005007107 Census Tract 71.07, Arapahoe County, Colorado           952
## 26  08005007201 Census Tract 72.01, Arapahoe County, Colorado          2195
## 27  08005007202 Census Tract 72.02, Arapahoe County, Colorado          1168
## 28  08005007301 Census Tract 73.01, Arapahoe County, Colorado          1316
## 29  08005007302 Census Tract 73.02, Arapahoe County, Colorado          5446
## 30  08005007400    Census Tract 74, Arapahoe County, Colorado          1293
## 31  08005007500    Census Tract 75, Arapahoe County, Colorado           510
## 32  08005007600    Census Tract 76, Arapahoe County, Colorado          1038
## 33  08005007702 Census Tract 77.02, Arapahoe County, Colorado          1922
## 34  08005007703 Census Tract 77.03, Arapahoe County, Colorado          1002
## 35  08005007704 Census Tract 77.04, Arapahoe County, Colorado           307
## 36  08005080000   Census Tract 800, Arapahoe County, Colorado          1572
## 37  08005080100   Census Tract 801, Arapahoe County, Colorado          1324
## 38  08005080200   Census Tract 802, Arapahoe County, Colorado          2542
## 39  08005080300   Census Tract 803, Arapahoe County, Colorado          1410
## 40  08005080400   Census Tract 804, Arapahoe County, Colorado          1111
## 41  08005080500   Census Tract 805, Arapahoe County, Colorado          1018
## 42  08005080600   Census Tract 806, Arapahoe County, Colorado           401
## 43  08005080700   Census Tract 807, Arapahoe County, Colorado           943
## 44  08005080800   Census Tract 808, Arapahoe County, Colorado           734
## 45  08005080900   Census Tract 809, Arapahoe County, Colorado          3249
## 46  08005081000   Census Tract 810, Arapahoe County, Colorado          4286
## 47  08005081100   Census Tract 811, Arapahoe County, Colorado          3895
## 48  08005081200   Census Tract 812, Arapahoe County, Colorado           598
## 49  08005081300   Census Tract 813, Arapahoe County, Colorado          3569
## 50  08005081400   Census Tract 814, Arapahoe County, Colorado          2323
## 51  08005081500   Census Tract 815, Arapahoe County, Colorado          3189
## 52  08005081600   Census Tract 816, Arapahoe County, Colorado          2084
## 53  08005081700   Census Tract 817, Arapahoe County, Colorado           389
## 54  08005081800   Census Tract 818, Arapahoe County, Colorado          5590
## 55  08005081900   Census Tract 819, Arapahoe County, Colorado          3641
## 56  08005082000   Census Tract 820, Arapahoe County, Colorado          3252
## 57  08005082100   Census Tract 821, Arapahoe County, Colorado          2796
## 58  08005082200   Census Tract 822, Arapahoe County, Colorado          2216
## 59  08005082300   Census Tract 823, Arapahoe County, Colorado          1511
## 60  08005082400   Census Tract 824, Arapahoe County, Colorado          2295
## 61  08005082500   Census Tract 825, Arapahoe County, Colorado           225
## 62  08005082600   Census Tract 826, Arapahoe County, Colorado          2786
## 63  08005082700   Census Tract 827, Arapahoe County, Colorado           698
## 64  08005082800   Census Tract 828, Arapahoe County, Colorado          1604
## 65  08005082900   Census Tract 829, Arapahoe County, Colorado          3011
## 66  08005083000   Census Tract 830, Arapahoe County, Colorado           211
## 67  08005083100   Census Tract 831, Arapahoe County, Colorado          1409
## 68  08005083200   Census Tract 832, Arapahoe County, Colorado           146
## 69  08005083300   Census Tract 833, Arapahoe County, Colorado           843
## 70  08005083400   Census Tract 834, Arapahoe County, Colorado           455
## 71  08005083500   Census Tract 835, Arapahoe County, Colorado           703
## 72  08005083600   Census Tract 836, Arapahoe County, Colorado          6238
## 73  08005083700   Census Tract 837, Arapahoe County, Colorado           134
## 74  08005083800   Census Tract 838, Arapahoe County, Colorado          2676
## 75  08005083900   Census Tract 839, Arapahoe County, Colorado          3919
## 76  08005084000   Census Tract 840, Arapahoe County, Colorado           150
## 77  08005084100   Census Tract 841, Arapahoe County, Colorado          1432
## 78  08005084200   Census Tract 842, Arapahoe County, Colorado           627
## 79  08005084300   Census Tract 843, Arapahoe County, Colorado           334
## 80  08005084400   Census Tract 844, Arapahoe County, Colorado           154
## 81  08005084500   Census Tract 845, Arapahoe County, Colorado           262
## 82  08005084600   Census Tract 846, Arapahoe County, Colorado          1375
## 83  08005084700   Census Tract 847, Arapahoe County, Colorado          1904
## 84  08005084800   Census Tract 848, Arapahoe County, Colorado            30
## 85  08005084900   Census Tract 849, Arapahoe County, Colorado            90
## 86  08005085000   Census Tract 850, Arapahoe County, Colorado           114
## 87  08005085100   Census Tract 851, Arapahoe County, Colorado             7
## 88  08005085200   Census Tract 852, Arapahoe County, Colorado            86
## 89  08005085300   Census Tract 853, Arapahoe County, Colorado           453
## 90  08005085400   Census Tract 854, Arapahoe County, Colorado             8
## 91  08005085500   Census Tract 855, Arapahoe County, Colorado            17
## 92  08005085600   Census Tract 856, Arapahoe County, Colorado             6
## 93  08005085700   Census Tract 857, Arapahoe County, Colorado           671
## 94  08005085800   Census Tract 858, Arapahoe County, Colorado           512
## 95  08005085900   Census Tract 859, Arapahoe County, Colorado           205
## 96  08005086000   Census Tract 860, Arapahoe County, Colorado            29
## 97  08005086100   Census Tract 861, Arapahoe County, Colorado            12
## 98  08005086300   Census Tract 863, Arapahoe County, Colorado           191
## 99  08005086400   Census Tract 864, Arapahoe County, Colorado            21
## 100 08005086500   Census Tract 865, Arapahoe County, Colorado          1074
## 101 08005086600   Census Tract 866, Arapahoe County, Colorado           426
## 102 08005086700   Census Tract 867, Arapahoe County, Colorado           174
## 103 08005086800   Census Tract 868, Arapahoe County, Colorado            23
## 104 08005086900   Census Tract 869, Arapahoe County, Colorado            21
## 105 08005087000   Census Tract 870, Arapahoe County, Colorado           900
## 106 08005087100   Census Tract 871, Arapahoe County, Colorado           382
## 107 08005087200   Census Tract 872, Arapahoe County, Colorado             4
## 108 08005087300   Census Tract 873, Arapahoe County, Colorado             1
## 109 08013012607 Census Tract 126.07, Boulder County, Colorado             1
## 110 08031001701   Census Tract 17.01, Denver County, Colorado             3
## 111 08031002403   Census Tract 24.03, Denver County, Colorado             1
## 112 08031002601   Census Tract 26.01, Denver County, Colorado             1
## 113 08031002703   Census Tract 27.03, Denver County, Colorado             3
## 114 08031002801   Census Tract 28.01, Denver County, Colorado             3
## 115 08031002803   Census Tract 28.03, Denver County, Colorado             2
## 116 08031002902   Census Tract 29.02, Denver County, Colorado             1
## 117 08031003003   Census Tract 30.03, Denver County, Colorado             6
## 118 08031003004   Census Tract 30.04, Denver County, Colorado             3
## 119 08031003102   Census Tract 31.02, Denver County, Colorado             3
## 120 08031003201   Census Tract 32.01, Denver County, Colorado             9
## 121 08031003202   Census Tract 32.02, Denver County, Colorado             3
## 122 08031003401   Census Tract 34.01, Denver County, Colorado             8
## 123 08031003402   Census Tract 34.02, Denver County, Colorado             1
## 124 08031004102   Census Tract 41.02, Denver County, Colorado             2
## 125 08031004106   Census Tract 41.06, Denver County, Colorado           195
## 126 08031004107   Census Tract 41.07, Denver County, Colorado            26
## 127 08031004301   Census Tract 43.01, Denver County, Colorado             2
## 128 08031004403   Census Tract 44.03, Denver County, Colorado            35
## 129 08031004404   Census Tract 44.04, Denver County, Colorado            15
## 130 08031004405   Census Tract 44.05, Denver County, Colorado            63
## 131 08031006804   Census Tract 68.04, Denver County, Colorado             1
## 132 08031006811   Census Tract 68.11, Denver County, Colorado             1
## 133 08031006813   Census Tract 68.13, Denver County, Colorado            30
## 134 08031006814   Census Tract 68.14, Denver County, Colorado             6
## 135 08031007006   Census Tract 70.06, Denver County, Colorado           164
## 136 08031007037   Census Tract 70.37, Denver County, Colorado           613
## 137 08031007088   Census Tract 70.88, Denver County, Colorado             1
## 138 08031007089   Census Tract 70.89, Denver County, Colorado             3
## 139 08031008304   Census Tract 83.04, Denver County, Colorado             2
## 140 08031008312   Census Tract 83.12, Denver County, Colorado            15
## 141 08031008388   Census Tract 83.88, Denver County, Colorado             4
## 142 08031008389   Census Tract 83.89, Denver County, Colorado             4
## 143 08031008390   Census Tract 83.90, Denver County, Colorado             4
## 144 08031008391   Census Tract 83.91, Denver County, Colorado             1
## 145 08031980000    Census Tract 9800, Denver County, Colorado            10
## 146 08035013901 Census Tract 139.01, Douglas County, Colorado           381
## 147 08035013904 Census Tract 139.04, Douglas County, Colorado             1
## 148 08035014007 Census Tract 140.07, Douglas County, Colorado             1
## 
## $zcta
##      GEOID                  NAME traffic_stops
## 1  0880010 ZCTA5 80010, Colorado         20176
## 2  0880011 ZCTA5 80011, Colorado         36835
## 3  0880012 ZCTA5 80012, Colorado         22681
## 4  0880013 ZCTA5 80013, Colorado         12558
## 5  0880014 ZCTA5 80014, Colorado         16634
## 6  0880015 ZCTA5 80015, Colorado          9762
## 7  0880016 ZCTA5 80016, Colorado          4154
## 8  0880017 ZCTA5 80017, Colorado         15986
## 9  0880018 ZCTA5 80018, Colorado           264
## 10 0880019 ZCTA5 80019, Colorado           175
## 11 0880022 ZCTA5 80022, Colorado             1
## 12 0880045 ZCTA5 80045, Colorado           925
## 13 0880102 ZCTA5 80102, Colorado             1
## 14 0880111 ZCTA5 80111, Colorado             4
## 15 0880112 ZCTA5 80112, Colorado           123
## 16 0880113 ZCTA5 80113, Colorado             2
## 17 0880121 ZCTA5 80121, Colorado             1
## 18 0880122 ZCTA5 80122, Colorado             1
## 19 0880137 ZCTA5 80137, Colorado             2
## 20 0880138 ZCTA5 80138, Colorado            16
## 21 0880202 ZCTA5 80202, Colorado             3
## 22 0880203 ZCTA5 80203, Colorado             2
## 23 0880205 ZCTA5 80205, Colorado             1
## 24 0880206 ZCTA5 80206, Colorado             3
## 25 0880207 ZCTA5 80207, Colorado             2
## 26 0880209 ZCTA5 80209, Colorado            11
## 27 0880210 ZCTA5 80210, Colorado            10
## 28 0880218 ZCTA5 80218, Colorado            17
## 29 0880220 ZCTA5 80220, Colorado            82
## 30 0880230 ZCTA5 80230, Colorado            61
## 31 0880231 ZCTA5 80231, Colorado            44
## 32 0880237 ZCTA5 80237, Colorado             1
## 33 0880238 ZCTA5 80238, Colorado           189
## 34 0880239 ZCTA5 80239, Colorado            29
## 35 0880247 ZCTA5 80247, Colorado           721
## 36 0880249 ZCTA5 80249, Colorado            19
## 37 0880303 ZCTA5 80303, Colorado             1
## 38    <NA>                  <NA>            25

Now that our list is named, we can use these names to extract list elements. For example, if we wanted to extract the zip code dataset, we could use the following:

geography_stop_list[["zcta"]]

Even after naming the list, we can continue referring to the list elements by their indices as well. For example:

geography_stop_list[[3]]

Now that we have all of our desired datasets stored in geography_stop_list, we can go ahead and write each of these datasets out to disk. To do so, we can first write a function that will take a given file and desired file name as arguments, and save this file to disk (to the working directory) with the given file name. We’ll assign this function to a new object named output_csv():

output_csv<-function(file, name){
  filename<-paste0(name, ".csv")
  write_csv(file, filename)
}

Note that in the function above, the part of the function’s body that reads filename<-paste0(name, ".csv") simply appends the desired file extension to the file name that is supplied as an argument.

You can go ahead and test that this function works for a single file; for instance, let’s try using this function to write out the zip code dataset, which we earlier assigned to police_stops_zip_code. We’ll name the file “zip_code_data”:

# Test function
output_csv(police_stops_zip_code, "zip_code_data")

If you check your working directory, you should see a CSV file containing the police_stops_zip_code data, saved as “zip_code_data.csv”.

Now that we’ve written a function that can write out a specified CSV file with a specified name, let’s go ahead and apply it iteratively to all of the list elements in geography_stop_list so that each dataset is written out to disk individually. To do so, we’ll use a variant of the walk() function, which works much the same as a map() function; the key difference is that when we use walk() (or a variant of walk()), we are not interested in the return value of the function, but rather in the action it performs (i.e transferring data to disk). As a general rule, if you are interested in iteratively applying a function to read in data, or to write out data, you should use a walk() function rather than map().

Because the output_csv() function has two arguments, we’ll use the walk2() function. Below, the first argument, geography_stop_list, contains the object that we’d like to write out to disk. The second argument, desired_geography_inputs, is the character vector we defined earlier, which contains the names of the list elements in geography_stop_list. We’ll use these names as our file names. Finally, output_csv() is the function we would like to apply:

walk2(geography_stop_list, desired_geography_inputs, output_csv)

The code above takes the first element of geography_stop_list, and writes this dataset out to the working directory as “county.csv” (the “county” is the first element of desired_geography_inputs and the “.csv” is appended within the output_csv() function’s body); it then takes the second element of geography_stop_list, and writes this dataset to the working directory as “tract.csv”; finally, it takes the third element of geography_stop_list and writes this dataset to the working directory as “zcta.csv”.

After running the code above, you should check your working directory to confirm that all of the datasets have been saved, and that the filenames are correct and correspond to the appropriate data.

3.2 Functions, iteration, and map creation

In the first lesson, we learned how to make a basic world choropleth map based on World Bank development indicator data extracted from the WDI package. Let’s first reproduce the script from that lesson, which generated a world choropleth map of trade as a percentage of GDP in the year 2015:

# Extract country boundaries (sans Antarctica) as an sf object using "ne_countries" function, and assign to object named "country_boundaries"
country_boundaries<-ne_countries(scale="medium", returnclass="sf") %>% 
                       filter(iso_a3 !="ATA")

# Extract cross-sectional data on trade as a percentage of GDP for the year 2015, using WDI package, and then rename the column named "NE.TRD.GNFS.ZS" to "trade_gdp_2015"; dataset is assigned to new object named "trade_gdp_2015"
trade_gdp_2015<-WDI(country="all",
                    indicator="NE.TRD.GNFS.ZS",
                    start=2015, 
                    end=2015, 
                    extra=T) %>% 
               rename(trade_gdp_2015=NE.TRD.GNFS.ZS)

# Joins "trade_gdp_2015" object to "country_boundaries" object, based on 3-digit ISO code, and assigns joined dataset to new object named "trade_2015_spatial"
trade_2015_spatial<-full_join(country_boundaries, trade_gdp_2015,
                                    by=c("iso_a3"="iso3c"))

# Uses tmap package to create map based on "trade_gdp_2015" column in "trade_2015_spatial" object and assigns result to object named "trade_2015"
trade_2015<-tm_shape(trade_2015_spatial)+
               tm_polygons(col="trade_gdp_2015",
                          n=7,
                          style="quantile",
                          palette="YlOrBr",
                          title="Trade as a % of GDP,\n2015",
                          textNA="No Data")+
            tm_layout(legend.outside=T,
                      legend.outside.position = "bottom",
                      main.title="Crossnational Variation in Commercial Integration, 2015",
                      main.title.size=1,
                      main.title.position="center",
                      inner.margins=c(0.06,0.10,0.10,0.08), # Sets margins to create whitespace
                      frame=FALSE,
                      attr.outside = TRUE)+ # Places credits section outside map
            tm_credits("Map Author: Aditya Ranganath\nData Source: World Bank\nDevelopment Indicators (WDI)", position=c(0.78,0), size=0.38) # Specifies content, position, size of credits

# prints "trade_2015"
trade_2015
## Warning: The shape trade_2015_spatial contains empty units.

Let’s say that for the project you are working on, you expect you’ll have to make a bunch of different world choropleth maps for different World Bank indicators. In order to automate this process, we can write a function that generalizes the code for a single map. Below, we’ll assign this function to an object named wdi_map_maker. This function takes the WDI variable code (“wdi_variable_code”), the desired start year for the WDI data (“start_year”), the end year for the WDI data (“end_year”), the desired legend title (“legend_title”), and the desired title for the map (“main_map_title”) as inputs, and returns a choropleth map based on these parameters:

wdi_map_maker<-function(wdi_variable_code, start_year, end_year, 
                        legend_title, main_map_title){
country_boundaries<-ne_countries(scale="medium", returnclass="sf") %>% 
                       filter(iso_a3 !="ATA")
  
wdi_extract<-WDI(country="all",
              indicator=wdi_variable_code,
              start=start_year,
              end=end_year,
              extra=T)

spatial_object_tomap<-inner_join(country_boundaries, wdi_extract,
                                    by=c("iso_a3"="iso3c"))
final_map<-tm_shape(spatial_object_tomap)+
  tm_polygons(col=wdi_variable_code,
              n=7,
              style="quantile",
              palette="YlOrBr",
              title=legend_title,
              textNA="No Data")+
    tm_layout(legend.outside=T,
              legend.outside.position = "bottom",
              main.title=main_map_title,
              main.title.size=1,
              main.title.position="center",
              inner.margins=c(0.06,0.10,0.10,0.08), # Sets margins to create whitespace
              frame=FALSE,
              attr.outside = TRUE)+ # Places credits section outside map
   tm_credits("Map Author: Aditya Ranganath\nData Source: World Bank\nDevelopment Indicators (WDI)", position=c(0.78,0), size=0.38) # Specifies content, position, size of credits
return(final_map)
}

Let’s now test this function; let’s say we want to make a map of services trade as a percentage of GDP (the WDI variable code for this indicator is “BG.GSR.NFSV.GD.ZS”), from the year 2017, with a legend title of “Services Trade (% of GDP)”, and a main map title of “Service Market Integration, 2017”. Let’s pass these arguments to the wdi_map_maker() function, and see if it works:

wdi_map_maker(wdi_variable_code="BG.GSR.NFSV.GD.ZS", start_year=2017, end_year=2017, 
                        legend_title="Services Trade (% of GDP)", main_map_title="Service Market Integration, 2017")

Let’s try another one. Let’s say we want to make a choropleth map of government taxes as a share of GDP (WDI code for this variable is “GC.TAX.TOTL.GD.ZS”) for the year 2017, and that we want to set the legend title as “Taxes (% of GDP)”, and the main title of the map as “Government Taxes as a Share of GDP, 2017”. Let’s plug those parameters into the function, and generate our map:

wdi_map_maker(wdi_variable_code="GC.TAX.TOTL.GD.ZS", start_year=2017, end_year=2017, 
                        legend_title="Taxes (% of GDP)", main_map_title="Government Taxes as a Share of GDP, 2017")

Now, let’s say you want to make several maps, and don’t want to apply wdi_map_maker() multiple times in a manual fashion. Rather, you want to iteratively apply the function to a set of multiple inputs, and generate multiple maps based on those inputs.

For simplicity, we’ll work with the same input parameters that we used individually in the two previous function calls to wdi_map_maker(), and see how we can apply wdi_map_maker() to these inputs in an iterative fashion.

The first step is to create a new list, which we’ll assign to an object named input_list_WDI, whose elements are vectors that contain the arguments we would like to pass as arguments to the wdi_map_maker() function:

input_list_WDI<-list(wdi_variable_code=c("BG.GSR.NFSV.GD.ZS", "GC.TAX.TOTL.GD.ZS"),
                    start_year=c(2017, 2017),
                    end_year=c(2017, 2017),
                    legend_title=c("Services Trade (% of GDP)", "Taxes (%GDP)"),
                    main_map_title=c("Service Market Integration, 2017", "Government Taxes as a Share of GDP, 2017")) 

It may be helpful to print the contents of input_list_WDI, so that we can clearly see its structure:

input_list_WDI
## $wdi_variable_code
## [1] "BG.GSR.NFSV.GD.ZS" "GC.TAX.TOTL.GD.ZS"
## 
## $start_year
## [1] 2017 2017
## 
## $end_year
## [1] 2017 2017
## 
## $legend_title
## [1] "Services Trade (% of GDP)" "Taxes (%GDP)"             
## 
## $main_map_title
## [1] "Service Market Integration, 2017"        
## [2] "Government Taxes as a Share of GDP, 2017"

Now, we can pass input_list_WDI as an argument to the pmap() function, along with wdi_map_maker(), to iteratively apply the wdi_map_maker() function using the arguments specified in input_list_WDI. The pmap() function returns a list containing the function’s outputs (in this case, world choropleth maps for different WDI variables), which we’ll assign to a new object named WDI_map_list:

WDI_map_list<-pmap(input_list_WDI, wdi_map_maker)

Let’s break down the code above, so that we understand how the pmap() function is working in this case:

  • First, it runs the wdi_map_maker() function using the first element of the wdi_variable_code vector (“BG.GSR.NFSV.GD.ZS”) in input_list_WDI, the first element of the start_year vector (“2017”), the first element of the “end_year” vector (“2017”), the first element of the legend_title vector (“Services Trade (% of GDP)”), and the first element of the main_map_title vector (“Service Market Integration, 2017”) as arguments. It deposits the map created using these arguments into the output list, WDI_map_list, as the first element.
  • Second, it runs the wdi_map_maker() function using the second element of the wdi_variable_code vector (“GC.TAX.TOTL.GD.ZS”) in input_list_WDI, the second element of the start_year vector (“2017”), the second element of the “end_year” vector (“2017”), the second element of the legend_title vector (“Taxes (%GDP)”), and the second element of the main_map_title vector (“Government Taxes as a Share of GDP, 2017”) as arguments. It deposits the map created using these arguments into the output list, WDI_map_list, as the second element.

Now, let’s print the contents of WDI_map_list, and confirm that our maps appear as expected:

WDI_map_list
## [[1]]

## 
## [[2]]

Recall that we can extract our list elements by specifying the desired element’s index with Base R’s double bracket syntax. For instance, if we wanted to extract the first element from WDI_map_list (the 2017 service market integration map), we could use the following:

WDI_map_list[[1]]

Recall, also, that we can use the pluck() function to extract list elements. For instance, if we want to extract the second element in WDI_map_list (the map of taxes as a share of GDP in 2017), we can use the following:

WDI_map_list %>% pluck(2)

If we want to name our list elements, we can do so using the names() function. Let’s say we want to name the list elements according to the WDI variable code that specifies the data which is being mapped. We can extract these variable codes from input_list_WDI with the following:

input_list_WDI$wdi_variable_code
## [1] "BG.GSR.NFSV.GD.ZS" "GC.TAX.TOTL.GD.ZS"

Now, we can assign these codes to the elements in WDI_map_list with the following:

names(WDI_map_list)<-input_list_WDI$wdi_variable_code

Lets print the contents of WDI_map_list, and confirm that the elements are now associated with the relevant variable codes:

WDI_map_list
## $BG.GSR.NFSV.GD.ZS

## 
## $GC.TAX.TOTL.GD.ZS

Now, in addition to extracting list elements by their index numbers, we can also do so using the assigned names. For instance, if we wanted to extract the map of taxes as a share of GDP using the pluck() function using its name (“GC.TAX.TOTL.GD.ZS”) rather than its index (2), we could use the following:

WDI_map_list %>% pluck("GC.TAX.TOTL.GD.ZS")

After generating all of your desired maps and depositing those maps into a list, you may wish to export these maps from R Studio, and save them to a local directory on your computer using a common file format (such as PDF, png, jpeg etc.).

If you would like to create a single document which contains all of the maps that are stored in a list, a good option is to use a PDF graphics device. The code below uses a PDF graphics device to write out the choropleth maps in WDI_map_list to the working directory as a 2-page PDF file in which each map is printed on a separate page in the order they appear in WDI_map_list:

pdf("WDI_maps.pdf")
WDI_map_list
## $BG.GSR.NFSV.GD.ZS
## 
## $GC.TAX.TOTL.GD.ZS
dev.off()
## quartz_off_screen 
##                 2

In the code above, the first line, pdf("WDI_maps.pdf") initiates the PDF device, and specifies that the output file name is to be “WDI_maps.pdf”. Then, the second line, WDI_map_list, specifies the name of the list object that contains the maps we’d like to print to our PDF document. Finally, dev.off() turns off the PDF device. At this point, you should check your working directory, and confirm that a PDF file named “WDI_maps.pdf” does indeed exist in your working directory, and that this file does contain the maps in WDI_map_list.

If you would prefer to write out all of the maps within WDI_map_list as separate files (rather than as a single file containing all the maps, as we just did), the process is a bit more involved, but can be implemented using skills we have already developed.

Recall from the first lesson that if we want to write out a single tmap object to disk, we can use the tmap’s tmap_save() function. For instance, let’s say we want to write out the trade_2015 tmap object we created earlier to our working directory, and save this file as a PDF file named “trade_2015_map”. We can do so with the following:

tmap_save(tm=trade_2015,
          filename="trade_2015_map.pdf",
          width=1920,
          height=1080)
## Warning: The shape trade_2015_spatial contains empty units.
## Map saved to /Users/adra7980/Documents/git_repositories/ARSC5040_GIS/trade_2015_map.pdf
## Size: 6.388889 by 3.597222 inches

Above, the tm argument specifies the name of the object we would like to write out to disk (trade_2015), and the filename argument specifies the desired filename for the exported file (trade_2015_map.pdf). The width and height arguments specify the desired dimensions for the exported map. After running the code, you should check your working directory and open the “trade_2015_map.pdf” file to ensure that everything looks in order. Recall that the tmap_save() function is quite flexible; if we had wanted the file saved as a png file instead of a pdf file, for instance, we could have simply used a “.png” extension for the filename rather than a “.pdf” extension (i.e. filename="trade_2015.pdf").

Now that we’ve refreshed our memory about how to export a single tmap object to disk, let’s consider how to export multiple tmap objects, stored in a list, as individual files in an iterative fashion (so that we don’t have to write out each map individually). The first step is to write a function to export tmap objects that generalized the code above. The code below creates a function that takes the name of a map object (map_object), a desired filename (map_file_name), and a desired file extension (extension) as arguments, and exports the map specified by those parameters using the dimensions we used above. It assigns this function to an object named map_export():

map_export<-function(map_object, map_file_name, extension){
  tmap_save(tm=map_object,
            filename=paste0(map_file_name, extension),
            width=1920,
            height=1080)
}

Now, let’s test out the map_export() function for a single map. We’ll write out the trade_2015 object using this function, but write it out as a PNG file instead of a PDF, and name it trade_2015_alternate:

map_export(map_object=trade_2015,
           map_file_name = "trade_2015_alternate",
           extension=".png")
## Warning: The shape trade_2015_spatial contains empty units.
## Map saved to trade_2015_alternate.png
## Resolution: 1920 by 1080 pixels
## Size: 6.4 by 3.6 inches (300 dpi)

Check your working directory to make sure that you see the “trade_2015_alternate.png” file we’ve just written out, and that the map has printed correctly.

Now that we’ve confirmed that the map_export() function is working correctly, let’s use it to iteratively write out the maps in WDI_map_list to disk as separate PDF files. Before proceeding, we need to decide on names for the exported files. For convenience, let’s name the exported files after the WDI variable codes, which is also the name of the list elements in WDI_map_list. Let’s extract these WDI variable code names, and assign this name vector to a new object named wdi_export_names:

wdi_export_names<-names(WDI_map_list)

Now, let’s define a list of inputs and assign it to an object named WDI_map_inputs_list; we’ll use this list to specify the arguments to map_export(). Below, map_object=WDI_map_listspecifies that the *tmap* objects we want to write out are stored inWDI_map_list. The second list element,map_file_name=name_vector, specifies that we want to use the filenames stored in thewdi_export_namesvector we created above as file names. Finally,extension=“.pdf”``` specifies that we want the exported files to be stored in PDF format.

WDI_map_inputs_list<-list(map_object=WDI_map_list,
                          map_file_name=wdi_export_names,
                          extension=".pdf")

Now that we have our input list, we’ll use the pwalk() function to iteratively apply map_export() using the arguments specified in WDI_map_inputs_list. The pwalk() function works in essentially the same way as the pmap() function; the main difference is that the latter returns an actual output, while the walk() family of functions is better-suited to performing a specific action (i.e. reading in or writing out data). Below, the first argument to pwalk() is the list of argument we would like to iteratively apply using the map_export() function (the second argument):

pwalk(WDI_map_inputs_list, map_export)
## Map saved to /Users/adra7980/Documents/git_repositories/ARSC5040_GIS/BG.GSR.NFSV.GD.ZS.pdf
## Size: 6.388889 by 3.597222 inches
## Map saved to /Users/adra7980/Documents/git_repositories/ARSC5040_GIS/GC.TAX.TOTL.GD.ZS.pdf
## Size: 6.388889 by 3.597222 inches

Let’s unpack what the code above is doing:

  • First, it uses the information in WDI_map_inputs_list to extract the first element from WDI_map_list (the map of services integration) , the first element from wdi_export_names (“BG.GSR.NFSV.GD.ZS”), and the “.pdf” character string, and uses these elements as (respectively) the arguments for “map_object”, “map_file_name”, and “extension” in the map_export() function. After the function runs using these arguments, the map of services integration in WDI_map_list is written out to the working directory as “BG.GSR.NFSV.GD.ZS.pdf”.
  • Second, it uses the information in WDI_map_inputs_list to extract the second element from WDI_map_list (the map of taxes as a share of GDP) , the second element from wdi_export_names (“GC.TAX.TOTL.GD.ZS”), and the “.pdf” character string, and uses these elements as (respectively) the arguments for “map_object”, “map_file_name”, and “extension” in the map_export() function. After the function runs using these arguments, the map of taxes as a share of GDP in WDI_map_list is written out to the working directory as “GC.TAX.TOTL.GD.ZS.pdf”.

You may have noticed that we only supplied one character string for the desired extension (i.e. extension=".pdf" in WDI_map_inputs_list), but that there are two maps and two desired file names. It wasn’t necessary to specify “.pdf” twice, because R automatically recycles vector values. Vector recycling refers to the fact that when performing vectorized operations, R requires vectors to be of the same length; when vectors of different lengths are supplied, R will reuse (i.e. “recycle”) the elements of the shorter vector until it matches the length of the longer vector(s). Here, the “.pdf” value used for the first iteration is automatically recycled in the second iteration, which is appropriate since we want both files to be PDF files; if, for example, we wanted the map of taxes as a share of GDP to be saved as a PNG file instead, we would need to specify extension=c(".pdf", ".png") in WDI_map_inputs_list instead of extension=".pdf". Note, also, that we could have also generated two PDF files by specifying extension=c(".pdf", ".pdf"), but given how vector recycling works in R, this was unnecessary.

3.3 Functions and iteration with raster data

In Class 3’s tutorial on rasters, we used a raster dataset of NYC’s population, along with a vector dataset of subway stops, to calculate the percentage of the city’s population that lives within 500 meters of a subway stop. Let’s remind ourselves of what the script we developed to calculate this percentage looked like:

# Read in NYC 2019 Raster
nyc_pop_2019<-raster("nyc_2019_pop.tif")

# Read in NYC subway stop data
nyc_subway_stops<-st_read("stops_nyc_subway_may2019.shp")

# Read in NYC borough data
nyc_boroughs<-st_read("nyu_2451_34490.shp")

# Create 500m subway buffers
nyc_subway_500m_buffer<-st_buffer(nyc_subway_stops, 1640.42)

# Dissolve buffers
nyc_subway_500m_buffer_dissolved<-nyc_subway_500m_buffer %>% 
                                    group_by() %>% 
                                    summarise()

# Transform CRS of "nyc_subway_500m_buffer_dissolved" to match "nyc_pop_2019"
nyc_subway_500m_buffer_dissolved_4326<-nyc_subway_500m_buffer_dissolved %>% 
                                        st_transform(4326)

# Use zonal statistics to calculate population in buffer zone ("nyc_subway_500m_buffer_dissolved_4326") based on "nyc_2019_population" raster
nyc_pop_within_buffer<-exact_extract(nyc_pop_2019, nyc_subway_500m_buffer_dissolved_4326, fun="sum")

# Convert "nyc_boroughs" to WGS84 CRS to match "nyc_pop_2019" CRS
nyc_borough_4326<-nyc_boroughs %>% st_transform(4326)

# Use zonal statistics to calculate total NYC population 
nyc_pop<-sum(exact_extract(nyc_pop_2019, nyc_borough_4326, fun="sum" ))

# Percentage Inside Buffer
pct_inside<-(nyc_pop_within_buffer/nyc_pop)*100

# Calculate Percentage Outside Buffer
pct_outside_buffer<-100-pct_inside

# Print "pct_outside_buffer"
pct_outside_buffer
## [1] 45.07731

Now, let’s say you want to modify this script, calculate the percentage of NYC residents that live more than 1000 meters away from a subway stop. One option is to generate a new set of buffers with a radius of 1000 meters, and then carry out subsequent calculations (that are modeled on the steps above) using these buffers. However, this is fairly tedious, and can quickly become overwhelming if you want to perform calculations for several different distance thresholds. In such circumstances, it is better to write a function that takes a given distance (in meters) as its input argument, and returns the percentage of the city’s population that lives in areas of the city that live more than that distance from a subway stop.

Below, we will write such a function, which is assigned to a new object named nyc_population_buffer. The nyc_population_buffer() function takes a single argument, “buffer_distance_meters”, which indicates the desired distance threshold (in meters) to use in computing the subway stop buffers. The body of the function takes this argument, and calculates the percentage of the city’s population whose proximity to a subway stop is further than this distance threshold. The code used in the function’s body is substantially the same as the script we used above to calculate the the percentage of NYC residents that live further than 500m from a subway stop. Note, however, that we already know (from the script above), the city’s total population (based on the raster dataset and associated zonal statistics calculation) in 2019:

# Prints NYC 2019 population based on zonal stats calculation
nyc_pop
## [1] 8309940

We’ll use this population estimate (8309940) directly in the function’s body, rather than repeating the calculation we’ve already performed. Also, note that instead of returning the percentage of residents beyond the given threshold as a numeric vector, the function returns a data frame, with one column containing the relevant distance threshold (i.e. the input argument), and another column containing the corresponding percentage of city residents whose proximity to the closest subway stop is greater than that distance.

nyc_population_buffer<-function(buffer_distance_meters){
  buffer_distance_feet<-buffer_distance_meters*3.281
  nyc_subway_buffer<-st_buffer(nyc_subway_stops, buffer_distance_feet)
  
  nyc_subway_buffer_dissolved<-nyc_subway_buffer %>% 
                                group_by() %>% 
                                summarise()
  
  nyc_buffer_dissolved_4326<-nyc_subway_buffer_dissolved %>% 
                                st_transform(4326)
  
  nyc_pop_within_buffer<-exact_extract(nyc_pop_2019, nyc_buffer_dissolved_4326, fun="sum")
  
  nyc_inside_buffer_pct<-(nyc_pop_within_buffer/8309940)*100
  
  pct_outside_buffer<-100-nyc_inside_buffer_pct
  
  final_df<-data.frame(buffer_distance_meters, pct_outside_buffer)
  
  return(final_df)
}

Let’s now test the nyc_population_buffer() we’ve just created. Let’s calculate the percentage of NYC’s 2019 population that lived further than 1000m from a subway stop:

nyc_population_buffer(buffer_distance_meters=1000)
##   buffer_distance_meters pct_outside_buffer
## 1                   1000           22.41969

We can see that this value is roughly 22%.

Let’s say we want to calculate the percentage of NYC’s 2019 population that lived further than 650m from a subway stop:

nyc_population_buffer(buffer_distance_meters=650)
##   buffer_distance_meters pct_outside_buffer
## 1                    650           34.34548

We can see that roughly 34% of the population lives further than 650m from a subway stop. It makes sense that this percentage is greater than the percentage of the city’s population that lives >1000m from a subway stop, since the total area outside 650m subway buffers is naturally greater than the total area outside 1000m subway buffers.

Now, let’s say that instead of trying different distance thresholds, one at a time, we want to pass several distance thresholds as arguments to the nyc_population_buffer() function, and create a data frame which contains multiple distance thresholds, and multiple corresponding return values that indicate the percentage of city residents who whose distance to a subway stop exceeds that threshold First, we’ll define a vector of distance thresholds, to which we’ll iteratively apply the nyc_population_buffer() function. We’ll assign this vector to an object named buffer_distance_meters_vector:

buffer_distance_meters_vector<-c(250, 500, 750, 1000, 1250, 1500, 1750, 2000)

Because we want our final results in a data frame, and there is only one argument to the nyc_population_buffer() function, we will use the map_dfr() function to iteratively apply nyc_population_buffer() to the distance thresholds contained in the argument vector, buffer_distance_meters_vector``. We'll assign the resulting data frame to a new object namedsubway_distance_table```:

subway_distance_table<-map_dfr(buffer_distance_meters_vector, nyc_population_buffer)

Above, the first argument to map_dfr() is the vector of arguments we want to iterate over (here, various distance thresholds), and the second argument, nyc_population_buffer(), is the function that we would like to apply.

The code takes the first element of buffer_distance_meters_vector (250), and uses this value as an input argument to nyc_population_buffer(). The result is the first row in the output table (with two columns, one named “buffer_distance_meters_vector” and the other “pct_outside_buffer”, which display the nyc_population_buffer() function’s input and output, respectively); the value of the “buffer_distance_meters” in this first row is 250 (the first element in buffer_distance_meters), and the corresponding value of “pct_outside_buffer” is the percentage of the city population outside a subway buffer zone of that area, as calculated by the nyc_population_buffer() function. It then takes the second element of buffer_distance_meters_vector (500), uses this value as an input argument to nyc_population_buffer(), and deposits the result in the second row of the resulting data frame. The code repeats this process, and appends the function’s subsequent outputs to subsequent data frame rows, until it runs through all of the arguments in the buffer_distance_meters_vector vector; it then assigns the output dataframe to a new object named subway_distance_table.

Let’s open subway_distance_table and see what our final data frame looks like:

subway_distance_table
##   buffer_distance_meters pct_outside_buffer
## 1                    250           78.07705
## 2                    500           45.07518
## 3                    750           29.56809
## 4                   1000           22.41969
## 5                   1250           17.94687
## 6                   1500           14.96516
## 7                   1750           12.77817
## 8                   2000           11.10808

Assembling this table would have been a time-consuming task if we had to modify and rerun our original script eight times. But by writing a function that generalizes that script for a buffer distance of any length, and using basic iteration functions from the purrr package, we were able to carry out this process with greater speed and efficiency.

4 Further reading

If you would like to explore functional programming in R in greater detail, a good place to start is Chapter 19 (“Functions”), Chapter 20 (“Vectors”), and Chapter 21 (“Iteration”), of R For Data Science by Hadley Wickham and Garrett Grolemund. Another useful resource is an (evolving) online book entitled Functional Programming, by Sara Altman, Bill Behrman, and Hadley Wickham. Finally, if you would like to gain more experience with the purrr package and map() functions, Rebecca Barter’s tutorial, entitled "Learn to purrr is an excellent primer. None of these resources are specifically focused on GIS, but the tools and techniques presented and elaborated in these tutorials can be adapted to GIS datasets and workflows, as we have saw in this tutorial’s examples.